Abstract

The classical (Turing) theory of computation has been extraordinarily successful in providing the 
foundations and framework for theoretical computer science. Yet its dependence on 0's and 1's is
fundamentally inadequate for providing such a foundation for modern scientific computation,
where most algorithms (with origins in Newton, Euler, Gauss, et al.) are real number
algorithms. 


In 1989, Mike Shub, Steve Smale and I introduced a theory of computation and complexity over 
an arbitrary ring or field R [BSS89]. If R is $\mathbb{Z}_2 = (\{0, 1\}, +, \cdot)$, the classical computer science
theory is recovered. If R is the field of real numbers $\mathbb{R}$, Newton’s algorithm, the paradigm
algorithm of numerical analysis, fits naturally into our model of computation. 


Complexity classes P, NP and the fundamental question “Does P = NP?” can be formulated 
naturally over an arbitrary ring R. The answer to the fundamental question depends in general on 
the complexity of deciding feasibility of polynomial systems over R. When R is $\mathbb{Z}_2$, this becomes
the classical satisfiability problem of Cook-Levin [Cook71, Levin73]. When R is the field of
complex numbers $\mathbb{C}$, the answer depends on the complexity of Hilbert’s Nullstellensatz.


1 This paper is based on the AWM Noether Lecture I gave at the Joint Mathematics Meetings in San Diego,
January, 2002. When I started to write up my talk, I came across a paper, “La diversité des mathématiques
face à un problème de logique,” by Alain Yger based on an expository talk he gave at the journées IREM
d'Aquitaine, June 20, 2001. (An English translation can be downloaded from his website:
http://www.math.u-bordeaux.fr/~yger/exposes.html [Yger01]). Here Yger discusses in greater detail
algebraic aspects of the P = NP? problem over $\mathbb{C}$. At the time it seemed reasonable to combine the two
papers, but ultimately I decided to focus more on my original lecture theme of the two traditions of 
computing. There is overlap also with Felipe Cucker’s “Three lectures on real computation,” based on talks 
he gave in Kaikoura, New Zealand in January, 2000 [Cucker01] and earlier expositions of my own 
[Blum90, Blum91] and of Steve Smale’s [Smale90, Smale97]. 

2 Supported in part by NSF grant # CCR-0122581. I am grateful to the Toyota Technological Institute at
Chicago for the gift of space-time to write this paper. 
The notion of reduction between problems (e.g. between traveling salesman and satisfiability) has
been a powerful tool in classical complexity theory. But now, in addition, the transfer of 
complexity results from one domain to another becomes a real possibility. For example, we can 
ask: Suppose we can show P = NP over $\mathbb{C}$ (using all the mathematics that is natural here). Then,
can we conclude that P = NP over another field such as the algebraic numbers, or even over $\mathbb{Z}_2$?
(Answer: Yes and essentially yes.) 


In this paper, I discuss these results and indicate how basic notions from numerical analysis such 
as condition, round-off and approximation are being introduced into complexity theory, bringing 
together ideas germinating from the real calculus of Newton and the discrete computation of 
computer science. The canonical reference for this material is the book, Complexity and Real 
Computation [BCSS98]. 


Complexity and Real Computation and authors Mike Shub, Lenore Blum, Felipe Cucker,
Steve Smale. (Photo taken by Victor Pan at Dagstuhl, 1995.)


1. Two Traditions of Computation 


The two major traditions of the theory of computation have, for the most part, run a
parallel, non-intersecting course. On the one hand, we have numerical analysis and
scientific computation; on the other hand, we have the tradition of computation theory 
arising from logic and computer science. 


Fundamental to both traditions is the notion of algorithm. Newton’s method is the 
paradigm example of an algorithm cited most often in numerical analysis texts. The 
Turing machine is the underlying model of computation given in most computer science 
texts on algorithms. Yet, Newton’s method is not discussed in these computer science 
texts (e.g. [CLRS01]), nor are Turing machines mentioned in texts on numerical analysis
(e.g. [StoerBurlirsch02]). 


More fundamental differences arise with the distinct underlying spaces, the mathematics 
employed and problems tackled by each tradition. In numerical analysis and scientific 
computation, algorithms are generally defined over the reals or complex numbers and
the relevant mathematics is that of the continuum. On the other hand, 0’s and 1’s are the
basic bits of the theory of computation of computer science and the mathematics
employed is generally discrete. The problems of numerical analysts tend to come from 
the classical tradition of equation solving and the calculus. Those of the computer 
scientist tend to have more recent combinatorial origins. The highly developed theory of 
computation and complexity theory of computer science in general is unnatural for 
analyzing problems arising in numerical analysis, yet no comparable formal theory has 
emanated from the latter. 


One aim of our work is to reconcile the dissonance between these two traditions, perhaps 
to unify, but most importantly to see how perspectives and tools of each can inform the 
other. 


Numerical Analysis / Scientific Computation       Logic / Computer Science

* Newton’s Method, paradigm example in            * Turing Machine, underlying model in
  most numerical analysis texts                     most computer science texts on algorithms

* No mention of Turing machines                   * No mention of Newton’s method

* Real & complex #’s                              * 0’s & 1’s (bits)

* Math is continuous                              * Math is discrete

* Problems are classical                          * Problems are newer

* Major conference: FoCM (Foundations of          * Major conference: FOCS (Foundations of
  Computational Mathematics)                        Computer Science)

* No formal model of computation or               * Everything coded by bits, unnatural for
  systematic complexity theory                      problems of numerical analysis


We begin with some background (section 2) and motivation (section 3), then we present 
our unifying model (section 4) and main complexity results (section 5) and finally, we see 
Turing meet Newton and fundamental links introduced (section 6). 


2. Background 


The motivation for logicians to develop a theory of computation in the 1930’s had little to 
do with computers (think of it, aside from historical artifacts, there were no computers 
around then). Rather, Gödel, Turing et al. were grappling with the question: “What does it
mean for a problem or set S ($\subset$ universe X) to be decidable?” For example, how can one
make precise Hilbert’s Tenth Problem? 
Example. Hilbert’s Tenth Problem [Hilbert1900].

Let $X = \{f \in \mathbb{Z}[x_1,...,x_n],\ n > 0\}$ and $S = \{f \in X \mid \exists \zeta \in \mathbb{Z}^n,\ f(\zeta_1,...,\zeta_n) = 0\}$.

Is S decidable? That is, can one decide by finite means, given a diophantine polynomial, 
whether or not it has an integer solution? (Actually, Hilbert’s challenge was: produce 
such a decision procedure.) 


The logicians’ subsequent formalization of the notion of decidability has had profound 
consequences. 


Definition. A set S ($\subset$ X) is decidable if its characteristic function $\chi_S$ (with values 1 on
S, 0 on X - S) is computable (in finite time) by a machine.


Such a 0-1 valued machine is called a decision procedure for S. On input $x \in X$ it answers
the question “Is $x \in S$?” with output 1 if YES and 0 if NO. Here X is $\Sigma^*$, the set of finite
but unbounded sequences over a finite set $\Sigma$. We will also allow X to be a decidable
subset of $\Sigma^*$ (e.g. as would be the case for the set of all diophantine polynomials
embedded in {0,1}* via some natural coding). N.B. $\Sigma^*$ is countable.


To complete the definition, many seemingly different machines were proposed. What has 
been striking is that all gave rise to the exact same class of “computable” functions. Thus, 
the class of input-output maps of Turing machines is exactly the class of computable functions
as derived from Post production systems, as from flow-chart machines, as from random
access machines, as from axiomatic descriptions of recursive functions, etc. This gives
rise to the belief, known as Church’s Thesis, that the computable functions form a natural 
class and any informal notion of procedure or algorithm can be realized within any of the 
formal settings. 




The Turing Machine (with simple operations, finite number of states, finite program and 
infinite tape) provides a mathematical foundation for the classical Theory of Computation 
(originated by logicians in the 1930’s and 40’s) and Complexity Theory (originated by 
computer scientists in the 1960’s and 70’s). The program of this 2-state TM instructs the 
machine “if in state A with a 0 or 1 in the current scanned cell, then print a 1, move R(ight) 
and go into state B; if in state B, print 0, move R and go into state A.” That the long-term
behavior of this machine is discernible is not the norm. (TM picture, courtesy of Bryan Clair.)
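
The 2-state program quoted in the caption is simple enough to simulate directly. The following is only a sketch, assuming the machine starts in state A on an initially blank tape and is run for a fixed number of steps (the machine itself never halts); the function name `run_two_state_tm` is hypothetical.

```python
def run_two_state_tm(steps):
    """Simulate the 2-state machine from the caption for a fixed number of
    steps and return the cells it has written, left to right."""
    tape, head, state = {}, 0, "A"
    for _ in range(steps):
        if state == "A":
            tape[head], state = 1, "B"   # state A: print 1, go to state B
        else:
            tape[head], state = 0, "A"   # state B: print 0, go to state A
        head += 1                        # both rules move R(ight)
    return [tape[i] for i in range(steps)]

pattern = run_two_state_tm(6)
```

The written cells alternate 1, 0, 1, 0, ..., which is why the long-term behavior of this particular machine is easy to discern.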


In 1970, Yuri Matijasevich answered Hilbert’s Tenth Problem in the negative by showing 
the associated characteristic function $\chi_S$ is not Turing computable and hence, by Church’s
Thesis, no procedure exists for deciding the solvability in integers of diophantine
equations. (Following the program mapped out earlier by Davis, Putnam and Julia
Robinson, Matijasevich needed only to show the existence of a diophantine relation of 
exponential growth [Matijasevich70, DavisMatijasevicRobinson76, Blum97].) 


Since the initial startling results of the 1930’s of the undecidability of the true statements 
of arithmetic and the halting problem for Turing machines, logicians have focused 
attention on classifying problems as to their decidability or undecidability. Considerable 
attention was placed on studying the hierarchy of the undecidable. Decidable problems, 
particularly finite ones, held little interest. In the 1960’s and 70’s, computer scientists 
started to realize that not all such decidable problems were alike, indeed some seemed to 
defy feasible solution by machine. 


Example. The SATisfiability Problem (SAT). Here,
$X = \{f \mid f: \mathbb{Z}_2^n \to \mathbb{Z}_2\}$ is the set of Boolean functions and
$S = \{f \in X \mid \exists \zeta \in \mathbb{Z}_2^n,\ f(\zeta_1,...,\zeta_n) = 0\}$.


It is assumed X is embedded in {0, 1}* via some natural encoding. Systematically testing
all possible $2^n$ arguments for a given Boolean function f clearly yields a decision
procedure for S. This procedure takes an exponential number of basic operations in the
worst case. We do not know if SAT is tractable, i.e. if there is a polynomial time decision
procedure for S.
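
The exhaustive procedure can be sketched in a few lines. This is an illustration only, assuming a Boolean function is handed to us as a Python callable on n bits and that, as in the example above, membership in S means the value 0 is attained; the name `brute_force_sat` and the sample function are hypothetical.

```python
from itertools import product

def brute_force_sat(f, n):
    """Return some zeta in {0,1}^n with f(*zeta) == 0, or None if none exists.

    All 2**n arguments may be tried, so the number of evaluations is
    exponential in n in the worst case.
    """
    for zeta in product((0, 1), repeat=n):
        if f(*zeta) == 0:
            return zeta
    return None

# Sample function over Z_2: f(x, y) = x*y + 1 (mod 2), which is 0 only at (1, 1).
witness = brute_force_sat(lambda x, y: (x * y + 1) % 2, 2)
```

A returned tuple is exactly the kind of witness that can be checked quickly, even though finding it took an exhaustive search.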


Definition. The decision problem (X, S) is in class P if the characteristic function $\chi_S$ is
polynomial time computable, i.e. computable by a polynomial time Turing machine. A
polynomial time Turing machine is one that halts in $c(\mathrm{size}\ x)^k$ Turing operations for some
fixed $c, k \geq 0$ and all inputs $x \in X$. Here size x is the length of the sequence x, i.e. the bit
length if $\Sigma = \{0,1\}$.


As was the case for computable functions, the polynomial time functions, the class P, and 
subsequently defined complexity classes, form natural classes independent of machine.³
Thus again, computer scientists have confidence they are working with a very natural 
class of functions and feel justified employing their favorite model. 


In the early 1970’s Steve Cook and Leonid Levin [Cook71, Levin73] independently 
formulated and answered the question about the tractability of SAT with another 
question: Does P = NP?⁴


A decision problem (X, S) is in class NP if for each $x \in S$ there is a polynomial time
verification of this fact. We will formalize the definition of NP in section 5, but meantime
we note that SAT $\in$ NP: If a Boolean function $f \in S$, then there is a witness $\zeta \in \mathbb{Z}_2^n$ that
provides, together with the computation of f on argument $\zeta = (\zeta_1,...,\zeta_n)$ producing value
0, a polynomial time verification. If $f \notin S$, then no such witness will do.


3 A main proviso is that integers are not coded in unary.
4 I am rephrasing their result. The usual statement is: SAT is NP-complete. 
The significance of the P = NP? question became clear when Dick Karp showed that the
tractability of each of twenty-one seemingly unrelated problems was equivalent to the 
tractability of SAT [Karp72]. The number of such problems known today is legion. 


Leonid Levin, Steve Cook, and Dick Karp.

As did the earlier questions about decidability, the P = NP? question has had profound
consequences. It is one of the most famous unsolved problems today, both in computer
science and in mathematics.⁵ The apparent dichotomy between the classes P and NP has
been the underpinning of some of the most important applications of complexity theory,
such as to cryptography and secure communication. Particularly appealing here is the
idea of using hard problems to our advantage.


Thus, the classical Turing tradition has yielded a highly developed and rich (invariant) 
theory of computation and complexity with essential applications to computation, and
deep, interesting questions. Why do we want a new model of computation?


3. Motivation for Model 


3.1. Decidability over the Continuum 
Is the Mandelbrot Set Decidable? 


Now we witnessed, a certain extraordinarily complicated-looking set, namely the 
Mandelbrot set. Although the rules which provide its definition are surprisingly 
simple⁶, the set itself exhibits an endless variety of highly elaborate structure.
Could this be an example of a non-recursive [i.e. undecidable] set, truly exhibited
before our mortal eyes?⁷ Roger Penrose, The Emperor’s New Mind [Penrose89].


5 P = NP? is one of the million dollar problems posed by the Clay Mathematics Institute [CMI2000].

6 The Mandelbrot set M is the set of all parameters $c \in \mathbb{C}$ such that the orbit of 0 under the quadratic map
$p(z) = z^2 + c$ remains bounded.

7 Emphasis, mine.
The Mandelbrot set $M = \{c \in \mathbb{C}$ such that the orbit of 0 under $z^2 + c$ remains
bounded$\}$. On the right we have a view of the elaborate structure on the boundary of M in
Seahorse Valley.


Classically, decidability is defined only for countable sets. The Mandelbrot set is 
uncountable. So, the Mandelbrot set is not decidable classically. But clearly this is not a 
satisfactory argument. So, how do we reasonably address Penrose’s question? 


From time to time, logicians and computer scientists do look at problems over the reals or 
complex numbers. One approach has been through “recursive analysis” which has its 
origins in Turing’s seminal 1936 paper [Turing36].⁸ In the first paragraph, Turing defines
“a number [to be] computable if its decimal can be written down by a machine.” Turing
later defines computable functions of computable real numbers.⁹ One can then imagine
an oracle Turing machine that, when fed a real number by oracle, decimal by decimal, 
outputs a real number decimal by decimal. A refinement of this notion forms the basis of 
recursive analysis [Pour-elRichards89]. This approach to the study of algorithms over the 
reals differs greatly from the way numerical analysts present real number algorithms. 


ORACLE TURING MACHINES 


Another tack, taken by computer scientists, is what might be called the “rational number model”
approach. The approach is not formalized, but its reasoning goes as follows: 


A. Machines are finite. 
B. Finite approximations of reals are rationals. 
C. Therefore, we are really looking at problems over the rationals.


If we are totally naive here, we quickly run into trouble. The rational skeleton of the
curve $x^3 + y^3 = 1$ on the positive quadrant is hardly informative.¹⁰


8 Here Turing first defines automatic machines and shows the undecidability of the halting problem.

9 Although Turing states in this paper that “a development of the theory of functions of a real variable
expressed in terms of computable numbers” would be forthcoming, I am not aware of any further such
development by him in this direction.

10 By Fermat, the only rational points on the curve are (0,1) and (1,0).
Rational points on the curve $x^3 + y^3 = 1$, $x, y \geq 0$.


We have even more serious concerns when this approach is used in complexity theory. 
Computer scientists measure complexity as a function of input word size (in bits). But, 
small perturbations (of input) can cause large differences in word size. For example, a
small perturbation of an input 1 to $1 + 1/2^n$ causes the word size to grow from 1 to n + 1.
Thus, an algorithm that is polynomial time according to the discrete model definition
would be allowed to take considerably more time on a perturbed input than on a given
input. If the problem instance were well conditioned, this clearly would not be
acceptable. The issue here is that the Euclidean metric is very different from the bit
metric.
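
The gap between the two metrics is easy to see numerically. The sketch below assumes one particular convention for word size (binary digits of the integer part plus binary digits of the fractional part of a dyadic rational); the helper name `bit_size` is hypothetical and other encodings would shift the counts by a constant.

```python
from fractions import Fraction

def bit_size(q):
    """Number of binary digits needed to write a positive dyadic rational q
    (integer digits plus fractional digits, not counting the binary point)."""
    frac_bits = q.denominator.bit_length() - 1
    if q <= 0 or q.denominator != 1 << frac_bits:
        raise ValueError("expected a positive dyadic rational")
    int_bits = max((q.numerator >> frac_bits).bit_length(), 1)
    return int_bits + frac_bits

# Word size of the nearby inputs 1 + 1/2**n for a few values of n.
sizes = {n: bit_size(1 + Fraction(1, 2**n)) for n in (1, 10, 100)}
```

Under this convention the input 1 has size 1, while the input 1 + 1/2^100, which differs from it by less than 10^-30 in the Euclidean metric, has size 101.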


Not paying attention to the distinction has caused both incompleteness in the analysis, 
and confusion in the comparison, of different algorithms over the reals. The comparison 
of competing algorithms for the Linear Programming Problem provides a case in point. 
We shall return to this example again (in section 6). 


Penrose explores similar scenarios for posing his question but in the end “... is left with 
the strong feeling that the correct viewpoint has not yet been arrived at.” 


The Mandelbrot example is perhaps too exotic to draw generalizations. We turn now to a 
decision problem ubiquitous in mathematics. 


The Hilbert Nullstellensatz over R.

Given a system of polynomial equations over a ring R,
$f_1(x_1,...,x_n) = 0,\ ...,\ f_m(x_1,...,x_n) = 0.$

Is there $\zeta \in R^n$ such that $f_1(\zeta) = ... = f_m(\zeta) = 0$?


We call the corresponding decision problem over R, $\mathrm{HN}_R$. If R is $\mathbb{Z}_2$, $\mathbb{Z}$, or the rational
numbers $\mathbb{Q}$, $\mathrm{HN}_R$ fits naturally into the Turing formalism (via bit coding of integers and
rationals). The corresponding decision problem over $\mathbb{Z}_2$ is essentially SAT (decidable but
not known to be in P), over $\mathbb{Z}$ it is Hilbert’s Tenth Problem (undecidable), and over $\mathbb{Q}$ it is not
known to be decidable or undecidable. If R is the real field $\mathbb{R}$ or the complex numbers $\mathbb{C}$,
then $\mathrm{HN}_R$ does not fit naturally into the Turing formalism.


An even simpler example is the high school algorithm for deciding whether or not a real
polynomial $ax^2 + bx + c$ has a real root. We just check if the discriminant $b^2 - 4ac \geq 0$.
We do not stipulate that a, b, and c be rational or be fed to us bit by bit, nor do we question whether
we can tell if a real number equals 0. We just work with the basic arithmetic
operations and comparisons of an ordered ring or field.
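
As a minimal sketch of this high school test, the function below uses only ring operations and one order comparison on its inputs. It assumes a genuine quadratic (a ≠ 0) and is exact only when the inputs are exact (for example Python integers or `Fraction`s), since floating point numbers merely approximate reals; the name `has_real_root` is hypothetical.

```python
from fractions import Fraction

def has_real_root(a, b, c):
    """Decide whether a*x**2 + b*x + c, with a != 0, has a real root."""
    if a == 0:
        raise ValueError("expected a genuine quadratic (a != 0)")
    # Three multiplications, one subtraction, one comparison.
    return b * b - 4 * a * c >= 0

examples = [has_real_root(1, 0, -1),                # x^2 - 1
            has_real_root(1, 0, 1),                 # x^2 + 1
            has_real_root(Fraction(1, 4), 1, 1)]    # (x/2 + 1)^2
```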


More generally, we have perfectly good algorithms for deciding $\mathrm{HN}_R$ over $\mathbb{R}$ and $\mathbb{C}$.
Recall,


Hilbert’s Nullstellensatz, HN [Hilbert1893].

Given $f_1(x_1,...,x_n), ..., f_m(x_1,...,x_n) \in \mathbb{C}[x_1,...,x_n]$. Then,

$f_1 = ... = f_m = 0$ is not solvable over $\mathbb{C}$ $\Leftrightarrow$ (*) $1 = \sum g_i f_i$, for some polynomials
$g_i \in \mathbb{C}[x_1,...,x_n]$.


This theorem provides a semi-decision procedure for the complement of $\mathrm{HN}_\mathbb{C}$: Given
$f_1,...,f_m \in \mathbb{C}[x_1,...,x_n]$, systematically search for $g_i$’s to solve (*). If found, then
$f_1 = ... = f_m = 0$ is not solvable over $\mathbb{C}$ and so output 0. (The search can be done by considering, for
each successive D, general polynomials $g_i$ of degree D with indeterminate coefficients.
Checking if there exist coefficients satisfying (*) reduces to solving $\sim D^n$ linear
equations over $\mathbb{C}$.)
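
A small worked instance of (*) may help; it is offered only as an illustration and is not taken from the lecture. Take n = 1 and m = 2 with $f_1 = x$ and $f_2 = x - 1$, which clearly have no common zero. Trying degree D = 0, with unknown constants $g_1 = a$ and $g_2 = b$, turns (*) into a linear system in a and b:

```latex
\[
  1 = a\,x + b\,(x - 1) = (a + b)\,x - b
  \quad\Longleftrightarrow\quad
  \begin{cases} a + b = 0 \\ -b = 1 \end{cases}
  \quad\Longleftrightarrow\quad
  a = 1,\; b = -1 ,
\]
\[
  \text{so that}\qquad 1 = (1)\cdot x + (-1)\cdot(x - 1).
\]
```

The certificate $g_1 = 1$, $g_2 = -1$ is found at the first degree tried, and the search stops with output 0.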


However, if $f_1 = ... = f_m = 0$ is solvable, then no such $g_i$’s will ever be found. Fortunately,
we have stopping rules. In 1926, Grete Hermann, a student of Emmy Noether, gave an
effective upper bound D ($= d^{2^n}$ where $d = \max(3, \deg f_i)$) on the degrees of the $g_i$’s
that one need consider [Hermann26].¹¹ If no solution is found for generic $g_i$’s of degree
D, then none exists and so we can output 1.


David Hilbert, Emmy Noether, and Grete Henry-Hermann.


This decision procedure, using arithmetic operations and comparisons on complex 
numbers, inspires us to take a different tack. Rather than forcing artificial coding of 
problems into bits, we propose a model of computation that computes over a ring (or 
field), using basic algebraic operations and comparisons on elements of the ring (or 
field). Thus, we have our first motivation for proposing a new model. 


11 Results of Brownawell [Brownawell87] and Kollár [Kollar88] imply that we need only check the case
$D = d^n$, which by Masser and Philippon is optimal. See [Yger01] and [KPS01] for more discussion and
refinements.
3.2. Algorithms of Numerical Analysis

Our next motivation comes directly from the tradition of numerical analysis. We start 
with the paper, “Rounding-off Errors in Matrix Processes” in The Quarterly Journal of 
Mechanics and Applied Mathematics, vol. 1, 1948, pp. 287-308 [Turing48]. Written by 
Alan Turing while he was preoccupied with solving large systems of equations, this
paper is quite well-known to numerical analysts but almost unknown to logicians and
computer scientists. Its implicit model of computation is more closely related to the 
former than the latter. 


In the first section of his paper, Turing considers the “measures of work in a process”: 


It is convenient to have a measure of the amount of work involved in a 
computing process, even though it be a very crude one...We might, for 
instance, count the number of additions, subtractions, multiplications, divisions, 
recordings of numbers, ...¹²


From this point of view, it is again natural to start with a model of computation where 
real numbers are viewed as entities and algebraic operations and comparisons, as well as 
simple accessing, are each counted as a unit of work. We will return again to this paper to 
motivate refinements of these initial “measures of work.” 


The Quarterly Journal of Mechanics and Applied Mathematics, vol. I, 1948.


We also want a model of computation which is more natural for describing algorithms of 
numerical analysis, such as Newton's method for finding zeroes of polynomials. Here,
given a polynomial f(z) over $\mathbb{R}$ or $\mathbb{C}$, the basic operation is the Newton map,
$N_f(z) = z - f(z)/f'(z)$, which is iterated until the current value satisfies some prescribed stopping rule.
Translating to bit operations would wipe out the natural structure of this algorithm.


12 Emphasis, mine.


Flowchart nodes: INPUT, COMPUTE, BRANCH (stopping rule), OUTPUT.


A Newton machine for executing Newton’s method.
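
The loop of such a machine can be sketched as follows. This is a minimal illustration, not the machine from the figure: it assumes the stopping rule |f(z)| < tol, an iteration cap, and floating point (or Python complex) numbers standing in for real or complex numbers; the name `newton` and its parameters are hypothetical.

```python
def newton(f, df, z0, tol=1e-12, max_iter=50):
    """Iterate the Newton map N_f(z) = z - f(z)/f'(z) starting from z0.

    Stops when |f(z)| < tol (one possible stopping rule) or after
    max_iter iterations, and returns the current value of z.
    """
    z = z0
    for _ in range(max_iter):
        if abs(f(z)) < tol:        # BRANCH: stopping rule satisfied?
            break
        z = z - f(z) / df(z)       # COMPUTE: one application of the Newton map
    return z

# f(z) = z^2 - 2 from a real start, and f(z) = z^2 + 1 from a complex start.
real_root = newton(lambda z: z * z - 2, lambda z: 2 * z, 1.0)
complex_root = newton(lambda z: z * z + 1, lambda z: 2 * z, 1 + 1j)
```

The same few lines serve over the reals and the complexes, which is the structure that a translation into bit operations would obscure.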


3.3. Does P = NP? 

In order to gain new perspective and access additional tools, mathematicians often find it 
profitable to view problems in a broader framework than originally posed. We are thus 
motivated to follow this path for the P = NP? problem. 


4. The Model: Machines over a Ring or Field R [BSS89] 




A Machine over a Ring or Field R, top level and internal views. 


We suppose R is a commutative ring or field (possibly ordered) with unit. A machine M 
over R has the following properties: 


Associated with M is an input space and an output space, both $R^\infty$ (the disjoint union of
$R^n$, $n \geq 0$). At the top level, our machine M is similar to a Turing machine. M has a 2-way
infinite tape divided into cells and a read-write head that can view a fixed number $k_M$ of
contiguous cells at a time.


Internal to M is its program, a finite directed graph with 5 types of nodes, each with 
associated operations and next node maps: 


• The operation $g_\iota$ associated with the input node $\iota$ takes elements $x = (x_1, ..., x_k)$ from the
input space $R^\infty$ and puts each $x_i$ (i = 1, ..., k) in successive tape cells, starting with the
leftmost one in M’s view. There is a unique next node $\iota' \neq \iota$.


• Each computation node $\eta$ has a built in polynomial or rational map $g_\eta: R^n \to R^m$ with
$n, m \leq k_M$. Given elements $x_1, ..., x_n$ in the first n cells of M’s view, the associated operation,
also called $g_\eta$, puts $g_\eta^j(x_1, ..., x_n)$ in the jth cell in M’s view (j = 1, ..., m). There is a unique
next node $\eta'$.


• For each branch node $\eta$, the associated operation is the identity. There are two possible
next nodes $\eta'_L$ and $\eta'_R$ depending on the leftmost element $x_1$ in M’s view. If $x_1 = 0$ ($\geq 0$, if R
is ordered), then $\eta' = \eta'_R$. If $x_1 \neq 0$ ($x_1 < 0$), then $\eta' = \eta'_L$.


• For each shift node $\sigma$, the associated map is the identity and there is a unique next node
$\sigma'$. Right shift nodes $\sigma_R$ shift M’s view one cell to the right, left shift nodes $\sigma_L$ shift one cell
to the left.


• The operation $g_N$ associated with the output node N outputs (by projection) the contents
of the tape into $R^\infty$. N has no next node.


The computable functions over R are the input-output maps $\varphi_M$ of machines M over R.
Thus, for $x \in R^\infty$, $\varphi_M(x)$ is defined if the output node N is reachable by following M’s
program on input x. If so, $\varphi_M(x)$ is the output $y \in R^\infty$.
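
To make the five node types concrete, here is a small interpreter sketch. It is a simplification under stated assumptions, not the formal definition: R is taken to be an ordered field (here the rationals via `Fraction`), the view has a fixed width k, the start node is named "in", and the output node returns only the first m cells of the view. The names `run`, `newton_machine` and `EPS2` are hypothetical. The sample program runs Newton's method for f(x) = x² − 2 with the stopping rule f(x)² ≤ 10⁻¹².

```python
from fractions import Fraction

def run(program, x, k=2, max_steps=10_000):
    """Run a machine on input tuple x; return (output, nodes traversed).

    program maps node names to tuples:
      ("input", nxt)            load x into successive cells of the view
      ("compute", g, nxt)       write g(view) into the first cells of the view
      ("branch", nxt_L, nxt_R)  go to nxt_R if the leftmost cell is >= 0, else nxt_L
      ("shift", d, nxt)         move the view d = +1 or -1 cells
      ("output", m)             halt, returning the first m cells of the view
    """
    tape, head, node, steps = {}, 0, "in", 0
    while steps < max_steps:
        steps += 1
        op = program[node]
        if op[0] == "input":
            for i, xi in enumerate(x):
                tape[head + i] = xi
            node = op[1]
        elif op[0] == "compute":
            view = tuple(tape.get(head + i, 0) for i in range(k))
            for j, yj in enumerate(op[1](*view)):
                tape[head + j] = yj
            node = op[2]
        elif op[0] == "branch":
            node = op[2] if tape.get(head, 0) >= 0 else op[1]
        elif op[0] == "shift":
            head += op[1]
            node = op[2]
        else:  # output node
            return tuple(tape.get(head + i, 0) for i in range(op[1])), steps
    raise RuntimeError("step limit reached; the machine may not halt")

EPS2 = Fraction(1, 10**12)          # square of the tolerance 10**-6
f = lambda x: x * x - 2

# View layout is (t, x): t is the branch test value, x the current iterate.
newton_machine = {
    "in":   ("input", "prep"),
    "prep": ("compute", lambda a, b: (a, a), "test"),
    "test": ("compute", lambda t, x: (EPS2 - f(x) ** 2, x), "br"),
    "br":   ("branch", "step", "proj"),
    "step": ("compute", lambda t, x: (t, x - f(x) / (2 * x)), "test"),
    "proj": ("compute", lambda t, x: (x,), "out"),
    "out":  ("output", 1),
}

out, nodes_traversed = run(newton_machine, (Fraction(1),))
```

The second returned value counts the nodes traversed from input to output, the quantity used as the cost measure in section 5.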


Although the machine’s view at any time is finite, the shift nodes enable the machine to 
read and operate on inputs from $R^n$ for all n, and thus we can model algorithms that are
defined uniformly for inputs of any dimension. Noting this, we can also construct
universal (programmable) machines over R. (We do not use Gödel coding. The machine
program itself is (essentially) its own code.)


If R is $\mathbb{Z}_2$, we recover the classical theory of computation (and complexity, as we shall
see). We also note that Newton’s method is naturally implemented by a machine over $\mathbb{R}$.


Let’s return now to questions of decidability over $\mathbb{R}$ and $\mathbb{C}$ which were so problematic
before. 


Definition. A problem over R is a pair $(X, X_{yes})$, where $X_{yes} \subset X \subset R^\infty$. X consists of the
problem instances, $X_{yes}$ of the yes-instances.


For $\mathrm{HN}_R$,
$X = \{f = (f_1,...,f_m) \mid f_i \in R[x_1,...,x_n],\ m, n > 0\}$.
$X_{yes} = \{f \in X \mid \exists \zeta \in R^n,\ f_i(\zeta_1,...,\zeta_n) = 0,\ i = 1,...,m\}$.


13 Machines over R can thus have a finite number of built in constants from R. 


Finite polynomial systems over R can be coded as elements of $R^\infty$ (by systematically
listing coefficients); thus X can be viewed as a subspace of $R^\infty$.


Definition. A problem over R, $(X, X_{yes})$, is decidable if the characteristic function of $X_{yes}$
in X is computable over R.


Thus, in this framework, we can state problems over $\mathbb{R}$ and $\mathbb{C}$ (or any ring or field) and
ask questions of decidability. The algorithm presented earlier, based on Hilbert’s
Nullstellensatz with effective bounds, is easily converted to a decision machine over $\mathbb{C}$,
and so $\mathrm{HN}_\mathbb{C}$ is decidable over $\mathbb{C}$. Similarly, $\mathrm{HN}_\mathbb{R}$ is decidable over $\mathbb{R}$ (by Seidenberg’s
elimination theory [Seid54]).


We can also now formally state, and answer, Penrose’s question about the Mandelbrot set 
M. Here X is $\mathbb{R}^2$ and $X_{yes}$ is M. (In order to allow algorithms that compare magnitudes,
we are viewing $\mathbb{C}$ as $\mathbb{R}^2$.)


Theorem [BlumSmale93]. The Mandelbrot set M is undecidable over $\mathbb{R}$.


The proof uses the fact that the boundary of a closed semi-decidable set in $\mathbb{R}^2$ has
Hausdorff dimension at most 1, whereas the Hausdorff dimension of the boundary of the
Mandelbrot set is 2 [Shishikura91].


It turns out that the complement of the Mandelbrot set M is semi-decidable over $\mathbb{R}$. To
see this, a little arithmetic shows that $M = \{c \in \mathbb{C}$ such that the sequence $c,\ c^2 + c,\ (c^2 + c)^2 + c,\ ...$
stays within the circle of radius 2$\}$. Hence, if at some point the orbit of 0
under the map $z^2 + c$ escapes the circle of radius 2, we can be certain that c is in the
complement of M. One can use this fact to “draw” the Mandelbrot set.


Flowchart nodes: INPUT (c, large integer N), COMPUTE, BRANCH ($n \geq N$), OUTPUT (escape time n).
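
The semi-decision procedure for the complement of M can be sketched as an escape-time loop. This is an illustration under assumptions: Python complex floats stand in for points of $\mathbb{R}^2$, the iteration cap `max_iter` plays the role of the large integer N, and the name `escape_time` is hypothetical.

```python
def escape_time(c, max_iter=1000):
    """Return the first n with |z_n| > 2, where z_0 = 0 and z_{k+1} = z_k**2 + c,
    or None if the orbit has not escaped within max_iter iterations."""
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        # Compare |z|^2 with 4 using real arithmetic on the two coordinates.
        if z.real * z.real + z.imag * z.imag > 4:
            return n
    return None

times = [escape_time(1 + 0j), escape_time(3 + 0j), escape_time(-1 + 0j)]
```

A returned integer certifies that c lies outside M. A result of `None` certifies nothing: the orbit may simply need more iterations, which is why this gives only a semi-decision procedure.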
Recent work of Arbieto and Matheus presents further results in this direction
[ArbietoMatheus03]. For example, consider the quadratic family of maps $Q_a(x) = 1 - ax^2$,
$a, x \in \mathbb{R}$. Then $\{a \in (0,1] : Q_a$ has positive Lyapounov exponent$\}$ is undecidable over
$\mathbb{R}$. The Lyapounov exponent measures the growth rate of iterates of the map $Q_a$, and its
positivity is manifest in the “chaos” of the dynamical system given by iterations of $Q_a$.


5. Complexity Classes and Theory over a Ring R 


Following the classical tradition, we measure time (or cost) as a function of input word 
size. Suppose $x \in R^\infty$. We define size(x) to be the vector length of x, thus size(x) = n if
$x \in R^n$. For machine M over R and input x, we define $T_M(x)$ to be the number of nodes
traversed from input to output when M is input x. $T_M$ is our measure of time or cost. So, if
$R = \mathbb{Z}_2$, size(x) is the bit size of x, and $T_M(x)$ is the bit cost of the computation.


Definition. $(X, X_{yes}) \in P_R$ (class P over R) if there exists a decision machine M and a
polynomial $p \in \mathbb{Z}[x]$ such that for each $x \in X$, $T_M(x) \leq p(\mathrm{size}(x))$. M is called a
polynomial time (decision) machine.


Definition. $(X, X_{yes}) \in NP_R$ (class NP over R) if there exists $(Y, Y_{yes}) \in P_R$ and a
polynomial $p \in \mathbb{Z}[x]$ such that, for each $x \in X$,
$x \in X_{yes} \Leftrightarrow \exists$ witness $w \in R^{p(\mathrm{size}(x))}$ such that $(x, w) \in Y_{yes}$.


In the above, we generally require that $(R^\infty, X) \in P_R$. Then, if $R = \mathbb{Z}_2$, we recover the
classical complexity classes, that is, class P over $\mathbb{Z}_2$ is the classical class P, and class NP
over $\mathbb{Z}_2$ is the classical class NP. Whenever we omit subscripts in the following, we are
referring to the case when $R = \mathbb{Z}_2$, the classical case.


Example. $\mathrm{HN}_R \in NP_R$. To see this let


$Y = \{(f, \zeta) \mid f = (f_1,...,f_m),\ f_i \in R[x_1,...,x_n],\ \zeta \in R^n,\ m, n > 0\}$ and
$Y_{yes} = \{(f, \zeta) \in Y \mid f_i(\zeta_1,...,\zeta_n) = 0,\ i = 1,...,m\}$.


Then, since polynomial evaluation is a polynomial time computation over R,¹⁴ we have
$(Y, Y_{yes}) \in P_R$. Also, given $f \in X$, $f = (f_1,...,f_m)$, $f_i \in R[x_1,...,x_n]$,
$f \in X_{yes} \Leftrightarrow \exists \zeta \in R^n$ such that $f_i(\zeta_1,...,\zeta_n) = 0$, i = 1,...,m, that is, $(f, \zeta) \in Y_{yes}$.
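
The verification step, membership of $(f, \zeta)$ in $Y_{yes}$, amounts to evaluating each polynomial at the proposed witness. The sketch below assumes a sparse coding of polynomials as dictionaries from exponent tuples to coefficients and uses exact rationals as the ring; the names `evaluate` and `verify` and the sample system are hypothetical.

```python
from fractions import Fraction

def evaluate(poly, zeta):
    """Evaluate a polynomial given as {exponent tuple: coefficient} at zeta."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for z, e in zip(zeta, exps):
            term *= z ** e
        total += term
    return total

def verify(system, zeta):
    """Check that zeta is a common zero of every polynomial in the system."""
    return all(evaluate(f, zeta) == 0 for f in system)

# Sample system: x^2 + y^2 - 1 = 0 and 5x - 3 = 0.
circle = {(2, 0): 1, (0, 2): 1, (0, 0): -1}
line = {(1, 0): 5, (0, 0): -3}
good = verify([circle, line], (Fraction(3, 5), Fraction(4, 5)))
bad = verify([circle, line], (Fraction(3, 5), Fraction(1, 5)))
```

Checking a given witness takes a number of ring operations bounded by a polynomial in the size of the coded system; finding one is the hard part.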


We note that there is an exponential time decision machine for HN over $\mathbb{Z}_2$, but as
mentioned earlier, we do not know if $\mathrm{HN}_{\mathbb{Z}_2} \in P$. On the other hand, since $\mathrm{HN}_\mathbb{Z}$ is not
decidable (over $\mathbb{Z}$), it certainly is not in $P_\mathbb{Z}$, thus $P_\mathbb{Z} \neq NP_\mathbb{Z}$.


14 It should be clear what formal definitions we are supposing here. The proof requires a construction.
5.1 Main Results
Theorem [Cook71, Levin73]. P = NP $\Leftrightarrow$ SAT $\in$ P.


Theorem [Karp72]. P = NP $\Leftrightarrow$ TSP¹⁵ $\in$ P $\Leftrightarrow$ X $\in$ P,
where X = Hamilton Circuit or any of 19 other problems.


Theorem [BSS89]. $P_R = NP_R \Leftrightarrow \mathrm{HN}_R \in P_R$
for $R = \mathbb{Z}_2, \mathbb{R}, \mathbb{C}$, or for any field R, unordered or ordered.


To prove this theorem, we show that given any problem $(X, X_{yes}) \in NP_R$ and instance
$x \in X$, we can code x (in polynomial time) as a polynomial system $f_x$ over R such that
$x \in X_{yes} \Leftrightarrow f_x$ has a zero over R. In other words, we give a polynomial time reduction
from any problem in class $NP_R$ into $\mathrm{HN}_R$. This is done by writing down the equations for
the computing endomorphism of an $NP_R$ machine.


So, $\mathrm{HN}_R$ is a universal NP-complete problem. We know that $\mathrm{HN}_R \in EXP_R$ for $R = \mathbb{R}$ and
$\mathbb{C}$. That is, there are exponential time algorithms for deciding the solvability of
polynomial systems over $\mathbb{R}$ and $\mathbb{C}$ [Renegar91, Renegar92, BPR96]. But, again, no
polynomial time algorithms are known.


So, in addition to the classical P = NP? question, we pose two new ones: 
Is $P_\mathbb{R} = NP_\mathbb{R}$? Is $P_\mathbb{C} = NP_\mathbb{C}$?


Understanding the complexity of the Hilbert Nullstellensatz thus plays a central role in 
complexity theory over R.¹⁶


5.2 Transfer Principles 
In the preface to Complexity and Real Computation [BCSS98], Dick Karp speculates 
about the transferability of complexity results from one domain to another: 


It is interesting to speculate as to whether the questions of whether $P_{\mathbb{R}} = NP_{\mathbb{R}}$ and
whether $P_{\mathbb{C}} = NP_{\mathbb{C}}$ are related to each other and to the classical P versus NP
question. ... I am inclined to think that the three questions are very different and
need to be attacked independently. ...¹⁷


One reason for the skepticism is that over $\mathbb{R}$ or $\mathbb{C}$ one can quickly build numbers of large
magnitude, for example by successive squaring. And over $\mathbb{R}$ or $\mathbb{C}$, arithmetic operations


15 The Traveling Salesman Problem (TSP) is generally stated as a search problem: Find the shortest 
(cheapest) path traversing all nodes. To view TSP as a decision problem, we introduce bounds: Given k, is 
there a path of length (cost) at most k traversing all nodes? 

16 Towards this goal, a sequence of five papers by Mike Shub and Steve Smale on the related Bezout’s 
Theorem presents a comprehensive analysis of the complexity of solving polynomial systems 
approximately and probabilistically [ShubSmale93a,93b,93c,94,96]. Related also is the work on straight 
line programs [GHMMP98]. 

17 Emphasis, mine.
on numbers of any magnitude can be done in one step. Hence, polynomial time machines
over $\mathbb{R}$ or $\mathbb{C}$ might be able to decide inherently hard discrete problems by “cheating”, e.g.
by quickly coding up an exponential amount of information within large numbers and 
subsequently getting an exponential amount of essential bit operations accomplished 
quickly. 
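The growth behind this concern is easy to observe. The sketch below uses Python's arbitrary-precision integers as a stand-in for exact unit-cost arithmetic; the function name `squaring_bit_lengths` is hypothetical. Each squaring counts as one step in the unit-cost model, yet the number of bits needed to write the result roughly doubles every time.

```python
def squaring_bit_lengths(start, steps):
    """Return the bit length of the value after each successive squaring."""
    value = start
    lengths = [value.bit_length()]
    for _ in range(steps):
        value = value * value  # a single arithmetic step in the unit-cost model
        lengths.append(value.bit_length())
    return lengths

lengths = squaring_bit_lengths(2, 10)
print(lengths)
```

Starting from 2, ten squarings produce $2^{1024}$, so a number of steps linear in $k$ yields a number whose bit length is exponential in $k$.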


In our book we present a number of transfer results for complexity theory, one of which 
transfers the $P_R = NP_R$? question across algebraically closed fields of characteristic 0.


Theorem [BCSS96]. $P_{\mathbb{C}} = NP_{\mathbb{C}} \iff P_K = NP_K$, where $K$ is any algebraically closed field of
characteristic 0, for example the field of algebraic numbers.


The transfer of the $P_R = NP_R$? problem from the algebraic numbers to the complexes had
been proved earlier by Christian Michaux, using model theoretic techniques
[Michaux94]. The other direction uses number theory. Here we show how a machine
over $\mathbb{C}$ with built-in algebraically independent constants can be simulated by a machine
without such constants that takes the same amount of time (up to a polynomial
factor) to compute. We call this result the elimination of constants. The constants in a
machine come into play at branch nodes where a decision is to be made as to whether or
not the current $x_i$, which is a polynomial in the machine constants, is equal to 0. This
polynomial is not presented to us in the standard form, but rather by a composition of the
polynomials along the computation path, so we cannot in general tell quickly enough if
the coefficients are all zero. Instead, we use the Witness Theorem to quickly generate
algebraic witnesses with the following property: if the original constants are replaced by these
witnesses and the resulting evaluation is 0, then the original polynomial is 0. (The theory
of heights comes into play here.)


Shortly after our book went to press, Steve Smale realized (after talking to Manuel Blum) 
that standard computer science arguments could yield a transfer result from $\mathbb{C}$ to the
classical setting [Smale2000]. 


Let BPP be the class of problems over $\mathbb{Z}_2$ that can be solved in bounded error (< 1/2)
probabilistic polynomial time. Class BPP is a practical modification of class P. Repeating
a BPP algorithm $k$ times produces a polynomial time algorithm with probability of error
$< 1/2^k$. For example, BPP algorithms for Primality (testing) were known and used, well
before it was known that Primality was in class P. 
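As an illustration of how repetition drives the error down, here is a sketch of the Miller-Rabin test, one well-known probabilistic primality test (the text does not say which test it has in mind, so this choice is an assumption). Its error is one-sided: a prime is never rejected, and each independent round accepts a composite with probability at most 1/4, so `rounds` repetitions leave an error of at most $4^{-\text{rounds}}$. The function name and the default of 20 rounds are illustrative.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin test: False means certainly composite, True means probably prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base gives no evidence of compositeness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness that n is composite
    return True

print(is_probable_prime(2**61 - 1))      # a Mersenne prime
print(is_probable_prime(323))            # 323 = 17 * 19
```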


Theorem. $P_{\mathbb{C}} = NP_{\mathbb{C}} \implies BPP \supseteq NP$.


The idea of the proof goes as follows. First we note that by adding polynomials of the
form $x(x-1)$ to an instance $f \in HN_{\mathbb{Z}_2}$ we get an equivalent instance $f^* \in HN_{\mathbb{C}}$. Then, any
polynomial time decision machine $M$ over $\mathbb{C}$ for $HN_{\mathbb{C}}$ will decide the solvability of $f^*$ and
hence the solvability of $f$ over $\mathbb{Z}_2$. The trouble is, $M$ might have a finite number of built-in
complex constants. As before, they come into play at branch nodes where a decision is to
be made as to whether or not a polynomial in these constants, presented by composition,
is equal to 0. Rather than generate witnesses as before to eliminate constants, the decision
is now made by probabilistically replacing these constants by a small number (by
Schwartz’s Lemma) of small numbers (by the Prime Number Theorem). Thus, given a
polynomial time machine $M$ over $\mathbb{C}$ for $HN_{\mathbb{C}}$, we could construct a probabilistic
polynomial time machine for $HN_{\mathbb{Z}_2}$.
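The flavor of this probabilistic zero test can be conveyed by the textbook version of polynomial identity testing: a polynomial presented by composition (here, a straight-line program) is evaluated at random points modulo a random prime, and any nonzero value certifies that the polynomial is not identically zero. This is only a sketch of the general technique, not the construction used in the proof; the instruction format, the prime range and all function names are assumptions made for the example.

```python
import random

def eval_slp(program, inputs, modulus):
    """Evaluate a straight-line program modulo `modulus`; return the last register."""
    regs = []
    for op, *args in program:
        if op == "const":
            regs.append(args[0] % modulus)
        elif op == "var":
            regs.append(inputs[args[0]] % modulus)
        elif op == "add":
            regs.append((regs[args[0]] + regs[args[1]]) % modulus)
        elif op == "sub":
            regs.append((regs[args[0]] - regs[args[1]]) % modulus)
        elif op == "mul":
            regs.append((regs[args[0]] * regs[args[1]]) % modulus)
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs[-1]

def is_prime(n):
    """Trial division; adequate for the small primes used below."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def random_prime(lo, hi, rng):
    """Draw random candidates from [lo, hi) until one is prime."""
    while True:
        candidate = rng.randrange(lo, hi)
        if is_prime(candidate):
            return candidate

def probably_zero(program, num_vars, trials=20, seed=0):
    """Return False if some random evaluation is nonzero, True otherwise."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_prime(10**6, 2 * 10**6, rng)
        point = [rng.randrange(p) for _ in range(num_vars)]
        if eval_slp(program, point, p) != 0:
            return False  # a nonzero value certifies the polynomial is nonzero
    return True

# (x + y)^2 - x^2 - 2xy - y^2, built by composition; registers are numbered from 0.
identity = [
    ("var", 0), ("var", 1), ("add", 0, 1), ("mul", 2, 2),  # r3 = (x + y)^2
    ("mul", 0, 0), ("mul", 1, 1), ("mul", 0, 1),           # r4 = x^2, r5 = y^2, r6 = xy
    ("add", 6, 6), ("sub", 3, 4), ("sub", 8, 7),           # r7 = 2xy, r9 = r3 - r4 - r7
    ("sub", 9, 5),                                          # r10 = r9 - y^2
]
# The same program without the final subtraction computes y^2, which is not zero.
not_identity = identity[:-1]

print(probably_zero(identity, 2), probably_zero(not_identity, 2))
```

A `True` answer can be wrong with small probability, while a `False` answer is always correct; this one-sided error is what repetition then reduces.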


Indeed this argument, although not this conclusion, had been used by Pascal Koiran in his 
1997 paper [Koiran97/93]. (See also, [CSS94].) 


Transfer results provide important connections between the two approaches to 
computing.¹⁸ Underlying connections derive from the uniform distribution of rational
data of bounded input length with respect to the probability distribution of condition 
numbers of numerical analysis [CMPS02]. 


6. Introducing Condition, Accuracy and Round-off into Complexity 
Theory: where Turing meets Newton 


We now return to the Turing paper on rounding-off errors referred to earlier:


ROUNDING-OFF ERRORS IN MATRIX PROCESSES
By A. M. TURING
(National Physical Laboratory, Teddington, Middlesex)
[Received 4 November 1947]


SUMMARY

A number of methods of solving sets of linear equations and inverting matrices
are discussed. The theory of the rounding-off errors involved is investigated for
some of the methods. In all cases examined, including the well-known ‘Gauss
elimination process’, it is found that the errors are normally quite moderate: no
exponential build-up need occur.

Included amongst the methods considered is a generalization of Choleski’s method
which appears to have advantages over other known methods both as regards
accuracy and convenience. This method may also be regarded as a rearrangement
of the elimination process.


THIS paper contains descriptions of a number of methods for solving sets
of linear simultaneous equations and for inverting matrices, but its main
concern is with the theoretical limits of accuracy that may be obtained in
the application of these methods, due to rounding-off errors.


From The Quarterly Journal of Mechanics and Applied Mathematics, vol. 1, 1948, p. 287. 


It is here that the notion of condition (of a linear system) was originally introduced. An 
illustrative example is presented on page 297: 


18 Transfer results are also known for questions regarding other complexity classes such as PSPACE 
[Koiran02]. 
$$\begin{aligned} 1.4x + 0.9y &= 2.7 \\ -0.8x + 1.7y &= -1.2 \end{aligned} \tag{8.1}$$

$$\begin{aligned} -0.786x + 1.709y &= -1.173 \\ -0.8x + 1.7y &= -1.2 \end{aligned} \tag{8.2}$$


The set of equations (8.2) is fully equivalent to (8.1)¹⁹, but clearly if we
attempt to solve (8.2) by numerical methods involving rounding-off 
errors we are almost certain to get much less accuracy than if we worked 
with equations (8.1). ... 


We should describe the equations (8.2) as an ill-conditioned set, or, at any 
rate, as ill-conditioned compared with (8.1).²⁰ It is characteristic of ill-
conditioned sets of equations that small percentage errors in the
coefficients given may lead to large percentage errors in the solution. 
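Turing's observation can be checked directly on his two systems. The sketch below solves each system exactly with rational arithmetic (Cramer's rule), then shifts the first right-hand side by 0.001 and measures the relative change in $y$. The helper names and the choice of perturbation are assumptions made for the illustration.

```python
from fractions import Fraction as F

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system exactly by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

def relative_change_in_y(coeffs, rhs, delta):
    """Relative change in y when the first right-hand side moves by delta."""
    _, y = solve_2x2(*coeffs, *rhs)
    _, y_perturbed = solve_2x2(*coeffs, rhs[0] + delta, rhs[1])
    return abs((y_perturbed - y) / y)

# Coefficients (a11, a12, a21, a22) and right-hand sides of (8.1) and (8.2).
system_81 = ((F("1.4"), F("0.9"), F("-0.8"), F("1.7")), (F("2.7"), F("-1.2")))
system_82 = ((F("-0.786"), F("1.709"), F("-0.8"), F("1.7")), (F("-1.173"), F("-1.2")))

delta = F("0.001")
change_81 = relative_change_in_y(*system_81, delta)
change_82 = relative_change_in_y(*system_82, delta)
print(change_81, change_82)
```

Both systems have the same exact solution, but the same small change in the data moves $y$ by 1/600 (about 0.17%) in (8.1) and by 1/6 (about 17%) in (8.2), a hundredfold difference.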


Turing’s notion of condition, clearly inspired by Newton’s derivative, links both 
traditions, particularly when considering questions of complexity. 


6.1. Condition Numbers and Complexity 

The condition of a problem (instance) measures how small perturbations of the input will 
alter the output. Introducing the notion of condition provides an important link between 
the two traditions of computing. 


[Diagram: input $x \mapsto$ output $\varphi(x)$; perturbed input $x + \Delta x \mapsto$ output $\varphi(x + \Delta x)$.]


With an appropriate norm, the ratio $\|\varphi(x + \Delta x) - \varphi(x)\| / \|\Delta x\|$, or the relative ratio

$(\|\varphi(x + \Delta x) - \varphi(x)\| / \|\varphi(x)\|) / (\|\Delta x\| / \|x\|)$, indicates the condition of an instance. If the quotient
is large, the instance is ill-conditioned and so requires more accuracy, and hence more resources,
to compute with small error.


Definition (Turing). Suppose $A$ is an $n \times n$ real matrix and $b \in \mathbb{R}^n$. The condition
number of the linear system $Ax = b$ is given by $\kappa(A) = \|A\| \, \|A^{-1}\|$. Here
$\|A\| = \max\{|Ay|/|y| : y \neq 0\}$ is the operator norm with respect to the Euclidean norm $|\cdot|$.


We note that $\kappa(A)$ is the worst case relative condition for solving the system $Ax = b$ for $x$.
Thus $\log^+ \kappa(A)$ provides a worst case lower bound for the loss of precision in solving the


19 The third equation is the second plus .01 times the first.
20 Emphasis, mine.
system.²¹ For computational purposes, ill-conditioned problem instances will generally
require more input precision than well conditioned instances. 
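For a $2 \times 2$ matrix the condition number can be computed in closed form, since the squared singular values sum to the squared Frobenius norm and multiply to the squared determinant. The sketch below applies this to the coefficient matrices of Turing's systems (8.1) and (8.2). The function names are hypothetical, and taking the logarithm to base 2 (so that the loss of precision is counted in bits) is an assumption, since the text does not fix a base.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values (largest first) of the matrix [[a, b], [c, d]]."""
    frob_sq = a * a + b * b + c * c + d * d    # sigma_max^2 + sigma_min^2
    det = abs(a * d - b * c)                   # sigma_max * sigma_min
    gap = math.sqrt(max(frob_sq * frob_sq - 4 * det * det, 0.0))
    sigma_max = math.sqrt((frob_sq + gap) / 2)
    sigma_min = det / sigma_max                # avoids cancellation for small sigma_min
    return sigma_max, sigma_min

def condition_number(a, b, c, d):
    """kappa(A) = ||A|| ||A^-1|| in the operator norm; A must be invertible."""
    sigma_max, sigma_min = singular_values_2x2(a, b, c, d)
    return sigma_max / sigma_min

def log_plus(x):
    """log^+ x: log x for x > 1 and 0 otherwise (base 2 assumed here)."""
    return math.log2(x) if x > 1 else 0.0

kappa_81 = condition_number(1.4, 0.9, -0.8, 1.7)
kappa_82 = condition_number(-0.786, 1.709, -0.8, 1.7)
print(kappa_81, log_plus(kappa_81))
print(kappa_82, log_plus(kappa_82))
```

The first matrix has a condition number close to 1, while the second is larger by more than two orders of magnitude, in line with Turing's description of (8.2) as ill-conditioned.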


During the 1980s a number of people [BlumShub86, Kostlan88] gave estimates on the
average loss of precision for (solving) linear systems over $\mathbb{R}$.


Theorem [Edelman88]. Average $\log^+ \kappa(A) \sim \log n$.


The log of condition provides an intrinsic parameter to replace arbitrarily chosen bit input 
word sizes for problem instances where the underlying mathematical spaces are $\mathbb{R}$ or $\mathbb{C}$.
And so a focus for complexity theory is to formulate and understand measures of 
condition. 


My favorite example for illustrating the issues raised, and the resolutions proposed, is the
Linear Programming Problem over $R$ ($LPP_R$) alluded to earlier, where $R$ is $\mathbb{Q}$ or $\mathbb{R}$. An
instance of the $LPP_R$ is to maximize a linear function $c \cdot x$ subject to the constraints
$Ax \le b$, $x \ge 0$, or to conclude no such maximum exists. The data here is $(A, b, c)$ where $A$ is an
$m \times n$ matrix over $R$, $b \in R^m$ and $c \in R^n$.


Competing algorithms for solving the $LPP_R$ are often posed, and analyzed, using distinct
models of computation. (Also, there are various equivalent mathematical formulations of
the $LPP_R$, not necessarily equivalent with respect to complexity theory.)


The simplex method for the LPP optimizes by traversing vertices. The newer interior point 
methods follow a trajectory of centers. 
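To make the role of vertices concrete, the following sketch solves a tiny two-variable instance by brute force: it intersects every pair of constraint lines, keeps the feasible intersection points, and returns the best one. This is not the simplex method, which moves between adjacent vertices instead of enumerating all of them; it only illustrates that a bounded, feasible instance attains its maximum at a vertex. The function name, the restriction to two variables, the boundedness assumption and the instance data are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

def solve_lp_2d(A, b, c):
    """Maximize c.x subject to A x <= b, x >= 0 in two variables by enumerating
    vertices. Assumes the feasible region is bounded; returns None if it is empty."""
    # Treat x >= 0 as the extra constraints -x1 <= 0 and -x2 <= 0.
    rows = [tuple(Fraction(v) for v in r) for r in A]
    rows += [(Fraction(-1), Fraction(0)), (Fraction(0), Fraction(-1))]
    rhs = [Fraction(v) for v in b] + [Fraction(0), Fraction(0)]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        (a1, a2), (a3, a4) = rows[i], rows[j]
        det = a1 * a4 - a2 * a3
        if det == 0:
            continue  # parallel constraint lines do not meet in a vertex
        # Intersection of the two constraint lines by Cramer's rule.
        x = ((rhs[i] * a4 - a2 * rhs[j]) / det, (a1 * rhs[j] - a3 * rhs[i]) / det)
        feasible = all(r[0] * x[0] + r[1] * x[1] <= s for r, s in zip(rows, rhs))
        if feasible:
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best

# Hypothetical instance: maximize 3x + 2y subject to
# x + y <= 4, x + 3y <= 6, x <= 3, x >= 0, y >= 0.
A = [(1, 1), (1, 3), (1, 0)]
b = [4, 6, 3]
c = (3, 2)
best = solve_lp_2d(A, b, c)
print(best)
```

With $m$ constraints in $n$ variables the number of candidate vertices grows combinatorially, which is why the order in which vertices are visited matters so much for the complexity of the simplex method.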


The simplex algorithm [Dantzig47/90], long a method of choice, is an algebraic
algorithm which in the worst case takes an exponential (in n and m) number of steps
[KleeMinty72] given exact arithmetic. For rational inputs, simplex is also exponential in
the input word size (given in bits) in the bit model.


21 $\log^+ x = \log x$ for $x > 1$, otherwise $\log^+ x = 0$.
On the other hand, for rational inputs, the ellipsoid algorithm [Khachiyan79] is
polynomial time in the input word size in the bit model. But, as an algorithm over $\mathbb{R}$, i.e.
allowing exact arithmetic, it is not finite in general [TraubWoz82, Blum90]. The same is
true of the newer interior point algorithms.²²


It would seem more natural, and appropriate, to analyze the complexity of algorithms for
the $LPP_R$ with respect to an intrinsic input word size. Hence we are motivated to define a
measure of the condition of a linear program [Blum90]. Jim Renegar [Renegar95] was
the first to propose such a condition number in this context. His definition is inspired by
the 1936 theorem of Eckart and Young.²³


Theorem [EckartYoung36]. For a real matrix $A$, $\kappa(A) \sim 1/d_F(A, \Sigma)$ where $\Sigma$ is the space of
ill-posed problem instances, i.e. $\Sigma$ is the space of non-invertible matrices. (Here the
relative distance $d_F$ is with respect to the Frobenius norm $\|A\|_F = \sqrt{\sum a_{ij}^2}$.)
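The theorem can be checked numerically in the $2 \times 2$ case. The sketch below constructs a singular matrix $B$ nearest to $A$ by removing the component of $A$ along its smallest singular direction, so that $\|A - B\|_F$ equals the smallest singular value of $A$. It then compares $\|A\|_F / \|A - B\|_F$, the reciprocal of the relative distance to the singular matrices, with the condition number. The function name is hypothetical, the matrix is assumed to be nonzero, and the example reuses the coefficient matrix of Turing's system (8.2).

```python
import math

def nearest_singular_2x2(a, b, c, d):
    """Return (B, distance): a singular matrix B closest to A = [[a, b], [c, d]]
    in the Frobenius norm, together with the distance ||A - B||_F."""
    # Entries of the symmetric matrix M = A^T A.
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Smallest eigenvalue of M, obtained as det(A)^2 / (largest eigenvalue)
    # to avoid cancellation.
    lam_max = (p + r + math.sqrt((p - r) ** 2 + 4 * q * q)) / 2
    lam_min = (a * d - b * c) ** 2 / lam_max
    # Unit eigenvector v of M for lam_min (a right singular vector of A).
    if q == 0:
        v = (1.0, 0.0) if p <= r else (0.0, 1.0)
    else:
        norm = math.hypot(q, lam_min - p)
        v = (q / norm, (lam_min - p) / norm)
    # Subtracting the rank-one matrix (A v) v^T makes v a null vector of B.
    av = (a * v[0] + b * v[1], c * v[0] + d * v[1])
    B = [[a - av[0] * v[0], b - av[0] * v[1]],
         [c - av[1] * v[0], d - av[1] * v[1]]]
    distance = math.hypot(av[0], av[1])  # ||(A v) v^T||_F = |A v| because |v| = 1
    return B, distance

A = (-0.786, 1.709, -0.8, 1.7)           # coefficient matrix of Turing's (8.2)
B, distance = nearest_singular_2x2(*A)
frobenius_norm = math.sqrt(sum(t * t for t in A))
inverse_relative_distance = frobenius_norm / distance
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
print(inverse_relative_distance, det_B)
```

For this matrix the reciprocal of the relative distance comes out at roughly 228, essentially the condition number of the matrix: an ill-conditioned instance is one that lies close to the ill-posed ones.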




We now consider the linear programming problem over $\mathbb{R}$ in the form:
Given $Ax = b$, $x \ge 0$, find a feasible solution or declare there is none.


Definition [Renegar95]. The condition number of a linear program over $\mathbb{R}$ with data
$(A, b)$ is given by $C(A, b) = \|(A, b)\| / d((A, b), \Sigma_{m,n})$. (Here, $\Sigma_{m,n}$ is the boundary of the
feasible pairs $(A, b)$ and both the operator norm $\|\cdot\|$ and the distance $d$ are with respect to
the respective Euclidean norms.)


Renegar proposes an interior point algorithm and analyzes it with respect to parameters: 
n, m, the loss of precision, and the desired accuracy of solution. 


Theorem [Renegar Interior Point Algorithm].²⁴ If the linear program is feasible, the
number of iterations to produce an $\varepsilon$-approximation to a feasible point is polynomial in $n$,
$m$, $\log^+ C(A, b)$ and $|\log \varepsilon|$.


Felipe Cucker and Javier Peña propose an algorithm and add round-off error as a
parameter for the complexity analysis [CuckerPeña02].


22 We emphasize this distinction. Simplex is a finite algorithm over $\mathbb{R}$ and over the rationals in both the
exact arithmetic model and the bit model. The newer interior point algorithms (e.g. [Karmarkar84,
Renegar88]) are finite only over the rationals in the bit model, not over $\mathbb{R}$.

23 Jim Demmel also was thus inspired to define and study condition numbers for 1-variable polynomials
[Demmel87].

24 By giving the simplest results here, I am hardly doing justice to the full extent of Renegar’s and others’
work in this area but hope this discussion will spur the reader to investigate this area more fully.
Theorem [Cucker-Peña Algorithm with Round-Off]. If the linear program is feasible,
the bit cost to produce an $\varepsilon$-approximation to a feasible point is
$O((m+n)^{3.5}(\log(m+n) + \log^+ C(A, b) + |\log \varepsilon|)^3)$. The finest precision required is a round-off
unit of $1/(c(m+n)^{12} C(A, b)^2)$.


While condition, approximation and round-off help bridge the combinatorial and 
continuous approaches to the design and analyses of linear programming algorithms, 
basic connections and complexity questions remain open. 


In particular: Is $LPP_{\mathbb{R}} \in P_{\mathbb{R}}$? Even more, is LPP strongly polynomial? That is, is there a
polynomial time algorithm over $\mathbb{R}$ for $LPP_{\mathbb{R}}$ that is also polynomial time with respect to
bit cost on rational input data?²⁵


In conclusion, I have endeavored to give an idea how machines over the reals --tempered 
with condition, approximation, round-off and probability²⁶-- enable us to combine tools
and traditions of theoretical computer science with tools and traditions of numerical 
analysis to help better understand the nature and complexity of computation. 